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Pricing  American  Options:  A  Duality  Approach^ 


Martin  B.  Haugh^  and  Leonid  Kogan* 


Abstract 

We  develop  a  new  method  for  pricing  American  options.  The  main  practical  contribution 
of  this  paper  is  a  general  algorithm  for  constructing  upper  and  lower  bounds  on  the  true 
price  of  the  option  using  any  approximation  to  the  option  price.  We  show  that  our 
bounds  are  tight,  so  that  if  the  initial  approximation  is  close  to  the  true  price  of  the 
option,  the  bounds  are  also  guaranteed  to  be  close.  We  also  explicitly  characterize  the 
worst-case  performance  of  the  pricing  bounds.  The  computation  of  the  lower  bound 
is  straightforward  and  relies  on  simulating  the  suboptimal  exercise  strategy  implied  by 
the  approximate  option  price.  The  upper  bound  is  also  computed  using  Monte  Carlo 
simulation.  This  is  made  feasible  by  the  representation  of  the  American  option  price  as  a 
solution  of  a  properly  defined  dual  minimization  problem,  which  is  the  main  theoretical 
result  of  this  paper.  Our  algorithm  proves  to  be  accurate  on  a  set  of  sample  problems 
where  we  price  call  options  on  the  maximum  and  the  geometric  mean  of  a  collection 
of  stocks.  These  numerical  results  suggest  that  our  pricing  method  can  be  successfully 
applied  to  problems  of  practical  interest. 
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1     Introduction 

Valuation  and  optimal  exercise  of  American  options  remains  one  of  the  most  challeng- 
ing practical  problems  in  option  pricing  theory.  The  computational  cost  of  traditional 
valuation  methods,  such  as  lattice  and  tree-based  techniques,  increases  rapidly  with  the 
number  of  underlying  securities  and  other  payoff-related  variables.  Due  to  this  well- 
known  curse  of  dimensionality,  practical  applications  of  such  methods  are  limited  to 
problems  of  low  dimension. 

In  recent  years,  several  methods  have  been  proposed  to  address  this  curse  of  dimen- 
sionality. Instead  of  using  traditional  deterministic  approaches,  these  methods  use  Monte 
Carlo  simulation  to  estimate  option  prices.  Tilley  (1993)  was  the  first  to  demonstrate 
that  American  options  could  be  priced  using  simulation  techniques.  Other  important 
work  in  this  literature  includes  Barraquand  and  Martineau  (1995),  Carriere  (1996),  Ray- 
mar  and  Zwecher  (1997),  Ibanez  and  Zapatero  (1999)  and  Garcia  (1999).  Longstaff  and 
Schwartz  (2001)  and  Tsitsiklis  and  Van  Roy  (1999,  2000)  have  proposed  an  approximate 
dynamic  programming  approach  that  can  compute  good  price  estimates  and  is  very  fast 
in  practice.  Tsitsiklis  and  Van  Roy  also  provide  theoretical  results  that  help  explain  the 
success  of  approximate  dynamic  programming  methods.  The  price  estimates  these  tech- 
niques construct,  however,  typically  only  give  rise  to  lower  bounds  on  the  true  option 
price.  As  a  result,  there  is  usually  no  formal  method  for  evaluating  the  accuracy  of  the 
price  estimates. 

In  an  important  contribution  to  the  literature,  Broadie  and  Glasserman  ( 1997a, b) 
develop  two  stochastic  mesh  methods  for  American  option  pricing.  One  of  the  advan- 
tages of  their  procedure  over  the  previously  proposed  methods  is  that  it  allows  them  to 
generate  both  lower  and  upper  bounds  on  the  option  price  that  converge  asymptotically 
to  the  true  option  price.  Their  bounds  are  based  on  an  application  of  Jensen's  inequal- 


ity  and  can  be  evaluated  by  Monte  Carlo  simulations.  However,  such  bounds  do  not 
necessarily  generalize  to  other  pricing  methods.  The  complexity  of  their  first  method  is 
exponential  in  the  number  of  exercise  periods.  The  second  approach  does  not  suffer  from 
this  drawback  but  nonetheless  appears  to  be  computationally  demanding.  In  an  effort 
to  address  this  drawback,  Boyle,  Kolkiewicz  and  Tan  (2001)  generalize  the  stochastic 
mesh  method  of  Broadie  and  Glasserman  (1997b)  using  low  discrepancy  sequences  to 
improve  the  efficiency  of  the  approach. 

The  main  practical  contribution  of  this  paper  is  a  general  algorithm  for  constructing 
upper  and  lower  bounds  on  the  true  price  of  the  option  using  any  approximation  to  the 
option  price.  We  show  that  our  bounds  are  tight,  so  that  if  the  initial  approximation 
is  close  to  the  true  price  of  the  option,  the  bounds  are  also  guaranteed  to  be  close.  In 
addition,  we  exphcitly  characterize  the  worst-case  performance  of  the  pricing  bounds. 
The  computation  of  the  lower  bound  is  straightforward  and  relies  on  simulating  the 
suboptimal  exercise  strategy  implied  by  the  approximate  option  price.  The  upper  bound 
is  obtained  by  simulating  a  different  stochastic  process  that  is  determined  by  choosing  an 
appropriate  supermartingale.  We  justify  this  procedure  by  representing  the  American 
option  price  as  a  solution  of  a  dual  minimization  problem,  which  is  the  main  theoretical 
result  of  this  paper. 

In  order  to  determine  the  option  price  approximation  underlying  the  estimation  of 
bounds,  we  also  implement  a  fast  and  accurate  valuation  method  based  on  approximate 
dynamic  programming  (see  Bertsekas  and  Tsitsiklis  1996)  where  we  use  non-linear  re- 
gression techniques  to  approximate  the  value  function.  Unlike  most  procedures  that  use 
Monte  Carlo  simulation  to  estimate  the  continuation  value  of  the  option,  our  method  is 
deterministic  and  relies  on  low  discrepancy  sequences  as  an  alternative  to  Monte  Carlo 
simulation.    For  the  examples  considered  in  this  paper,  we  find  that  low  discrepancy 


sequences  provide  a  significant  computational  improvement  over  simulation. 

While  the  duality-based  approach  to  portfolio  optimization  problems  has  proved 
successful  and  is  now  widely  used  in  finance,  (see,  for  example,  Karatzas  and  Shreve 
1998),  the  dual  approach  to  the  American  option  pricing  problem  does  not  seem  to  have 
been  previously  developed  other  than  in  recent  independent  work  by  Rogers  (2001). 
Rogers  establishes  a  dual  representation  of  American  option  prices  similar  to  ours  and 
applies  the  new  representation  to  compute  upper  bounds  on  several  types  of  American 
options  using  Monte  Carlo  simulation.  However,  he  does  not  provide  a  formal  systematic 
approach  for  generating  tight  upper  bounds  and  his  computational  approach  is  problem 
specific. 

Andersen  and  Broadie  (2001)  use  the  methodology  developed  in  this  paper  to  for- 
malulate  another  computational  algorithm  based  on  Monte  Carlo  simulation.  A  dis- 
tinguishing feature  of  their  approach  is  the  use  of  an  approximate  exercise  policy,  as 
opposed  to  an  approximate  option  price,  to  estimate  the  bounds  on  the  true  price  of  the 
option.  They  also  observe  that  a  straightforward  modification  (taking  the  martingale 
part  of  the  supermartingale)  of  the  stochastic  process  used  to  estimate  the  upper  bound 
leads  to  more  accurate  estimates.  This  observation  also  applies  to  the  algorithm  we  use 
in  this  paper  where  we  begin  with  an  initial  approximation  to  the  option  price.  This 
led  to  a  significant  improvement  in  the  computational  results  that  were  presented  in  an 
earlier  draft  of  this  paper. 

The  algorithm  of  Andersen  and  Broadie  is  quadratic  in  the  number  of  exercise  pe- 
riods, given  knowledge  of  the  approximate  exercise  policy,  while  our  approach  is  linear 
given  knowledge  of  the  approximate  option  price.  While  it  is  true  that  formal  complexity 
results  are  not  currently  available,  it  does  seem  to  be  the  case  that  accurate  approxi- 
mations to  the  option  price  can  be  computed  very  quickly.   (See,  for  example,  Longstaff 


and  Schwartz  2001,  Tsitsiklis  and  Van  Roy  2000,  and  other  approximate  dynamic  pro- 
gramming methods  such  as  the  method  used  in  this  paper.)  As  a  result,  we  beheve 
that  pricing  methods  based  on  an  initial  approximation  to  the  option  price,  rather  than 
the  exercise  policy,  will  be  more  efficient  in  practice.  This  should  hold  in  particular  for 
problems  with  many  exercise  problems.  For  a  particular  set  of  sample  problems  with  10 
exercise  dates,  Andersen  and  Broadie  compute  pricing  bounds  that  are  similar  to  ours. 
However,  they  do  not  report  results  for  problems  with  more  than  10  exercise  periods  and 
so  it  is  difficult  to  compare  the  practical  tradeoffs  between  our  alternative  approaches. 
The  rest  of  the  paper  is  organized  as  follows.  In  Section  2  we  formulate  the  problem. 
In  Section  3,  we  derive  the  new  duality  result  for  American  options  and  use  it  to  derive 
an  upper  bound  on  the  option  price.  In  Section  4  we  describe  the  implementation  of  the 
algorithm.  We  report  numerical  results  in  Section  5  and  we  conclude  in  Section  6. 

2     Problem  Formulation 

In  this  section  we  formulate  the  American  option  pricing  problem. 
Information  Set.  We  consider  an  economy  with  a  set  of  dynamically  complete  financial 
markets,  described  by  the  underlying  probability  space,  il,  the  sigma  algebra  T,  and  the 
risk-neutral  valuation  measure  Q.  It  is  well  known  (see  Harrison  and  Kreps  1979)  that 
if  financial  markets  are  dynamically  complete,  then  under  mild  regularity  assumptions 
there  exists  a  unique  risk-neutral  measure,  allowing  one  to  compute  prices  of  all  state- 
contingent  claims  as  the  expected  value  of  their  discounted  cash  flows.  The  information 
structure  in  this  economy  is  represented  by  the  augmented  filtration  {Tt  '■  t  €  [0.7"]}. 
More  specifically,  we  assume  that  Tt  is  generated  by  Z(,  a  d-dimensional  standard  Brow- 
nian  motion,  and  the  state  of  the  economy  is  represented  by  an  .F(-adapted  Markovian 
process  {Xt  e  ^  -.t   €  [0,r]}. 


Option  Payoff.  Let  /i,  =  /j(A',)  be  a  nonnegative  adapted  process  representing  the 
payoff  of  the  option,  so  that  the  holder  of  the  option  receives  /i,  if  the  option  is  exercised 
at  time  t.  We  also  define  a  riskless  account  process  Bt  =  exp  (  j^ /-^c/s- j,  where  r^ 
denotes  the  instantaneously  risk-free  rate  of  return.  We  assume  that  the  discounted 
payoff  processes  satisfies  the  following  integrability  condition 


max 

(=0,1,. ..,T 


(1) 


where  E,  [■]  denotes  the  expected  value  under  the  risk-neutral  probability  measure,  con- 
ditional on  the  time  t  information,  JF,. 

Exercise  Dates.  The  American  feature  of  the  option  allows  its  holder  to  exercise  it 
at  any  of  the  pre-specified  exercise  dates  in  T  =  {0, 1, . . . ,  T},  equally  spaced  between  0 
and  T.  Equal  spacing  of  the  exercise  dates  is  assumed  to  simplify  the  notation  and  is 
not  restrictive.  Moreover,  a  unit  time  increment  within  the  model  can  be  mapped  into 
any  period  of  calendar  time. 

Option  Price.  The  value  process  of  the  American  option,  Vt,  is  the  price  process  of 
the  option  conditional  on  it  not  having  been  exercised  before  t.  It  satisfies 


Vt  =  sup  Et 


Bthr 


Br 


(2) 


where  r  is  any  stopping  time  with  values  in  the  set  Tn[t,T].  This  is  a  well-known 
characterization  of  American  options  (see,  for  example,  Bensoussan  1984,  Karatzas  1988 
and  Pliska  1997). 

3     Theory 

Our  approach  to  the  American  option  pricing  problem  consists  of  the  following  steps. 
Step  1.  Compute  an  approximation  to  the  market  price  of  the  option  as  a  function  of  the 


time  and  state.  Specifically,  we  use  an  approximate  dynamic  programming  algorithm  to 

determine  the  continuation  value  of  the  option,  i.e.,  the  value  of  the  option  conditional 

on  not  exercising  it  at  the  current  time  period. 

Step  2.   Estimate  the  lower  bound  on  the  option  price  by  simulating  the  approximate 

exercise  strategy  based  on  the  option  price  approximation  from  Step  1. 

Step  3.  Based  on  the  option  price  approximation,  define  a  martingale  process  and  use 

it  to  estimate  the  upper  bound  by  Monte  Carlo  simulation.   This  last  step  is  based  on 

the  new  dual  representation  of  the  option  price  presented  below. 

We  begin  this  section  with  our  main  theoretical  result  on  the  dual  representation 
of  the  American  option  price.  We  then  show  how  to  use  this  price  characterization  to 
compute  bounds  on  the  option  price,  and  study  the  properties  of  these  bounds. 

3.1      The  Dual  Problem 

The  problem  of  pricing  an  American  option,  the  primal  problem,  is  that  of  computing 


Vo  =  sup  Eo 
rer 


For  an  arbitrary  adapted  supermartingale,  iTt,  we  define  the  dual  function  F{t,  it)  as 

+t7r  (3) 
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Then  the  dual  problem  is  to  minimize  the  dual  function  at  time  0  over  all  supermartin- 
gales,  TTj.  Let  Uq  denote  the  optimal  value  of  the  dual  problem,  so  that 


Uo  =  inf  F(0,  tt)  =  inf  Eq 


max     — —  TT, 


+^  (D) 


The  following  theorem  shows  that  the  optimal  values  of  the  dual  and  primal  problems 
coincide. 


Theorem  1  (Duality  Relation)  The  optimal  values  of  the  primal  problem  (P)  and 
the  dual  problem  (D)  are  equal,  i.e.,  Vq  =  f/o-  Moreover,  an  optimal  solution  of  the  dual 
problem  is  given  by  tTj*  =  Vt/Bt,  where  Vt  is  the  value  process  for  the  American  option, 

'hrB,' 
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Proof  For  any  supermartingale  tt;, 
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where  the  first  inequahty  follows  from  the  optional  sampling  theorem  for  supermartin- 
gales  (see  Billingsley  1995)  and  condition  (1).  Taking  the  infimum  over  all  supermartin- 
gales,  TTf,  on  the  right  hand  side  of  (4)  implies  Vq  <  f/o.  On  the  other  hand,  the  process 
Vt/Bt  is  a  supermartingale,  which  implies 


Uo<Eo    max  {ht/Bt -Vt/Bt)    +Vo. 
teT 


Since  Vt  >  ht  for  all  t,  we  conclude  that  Uq  <  Vq.  Therefore,  Vq  =  Uo,  and  equality  is 
attained  when  tt^*  =  V/Bt.    ■ 

Theorem  1  shows  that  an  upper  bound  on  the  price  of  the  American  option  can  be 
constructed  simply  by  evaluating  the  dual  function  over  an  arbitrary  supermartingale 
TTt-  In  particular,  if  such  a  supermartingale  satisfies  nt  >  ht/Bt,  the  option  price  Vq  is 
bounded  above  by  tto.  Theorem  1  therefore  implies  the  following  well-known  character- 
ization of  the  American  option  price  (see,  for  example,  Phska  1997). 


Proposition  1   (Option  Price  Characterization)  The  discounted  option  price  pro- 
cess Vt/Bt  is  the  smallest  supermartingale  that  dominates  the  discounted  payoff  of  the 


option  at  all  exercise  periods. 

The  reverse  is  also  true,  i.e.,  one  can  use  Proposition  1  to  prove  Theorem  1.  Note 
that  since  both  processes  on  the  right-hand  side  of  (3)  are  supermartingales,  so  is  the 
discounted  dual  function  F{t,n)/Bt.  Clearly  F(i,  tt)  >  ht/Bt  for  any  t,  implying  that 
-^(0,  tt)  >  Vo  by  Proposition  1.  When  tt^  is  chosen  to  be  the  discounted  option  price,  the 
reverse  inequality  holds,  implying  that  the  values  of  the  primal  and  the  dual  problems 
coincide.  Theorem  1  therefore  expresses  the  well-known  result  of  Proposition  1  in  a  new 
constructive  form,  which  we  use  to  evaluate  the  upper  bound  on  the  option  price. 

The  pricing  problem  is  closely  related  to  the  problem  of  dynamic  replication  of  the 
American  option,  which  is  equally  important  in  practice.  While  various  methods  for 
approximating  American  option  prices  have  been  suggested  in  the  literature,  computing 
reliable  replication  strategies  has  remained  a  challenging  problem.  The  result  of  Theorem 
1  suggests  a  method  for  super-replicating  the  American  option. 

Since  the  discounted  dual  function  F{t,  7r)/Bt  is  a  supermartingale  and  financial 
markets  in  our  model  are  dynamically  complete,  there  exists  a  self-financing  trading 
strategy  with  an  initial  cost  F(0,  n)  which  almost  surely  dominates  F{t,  tt)  (see  Duffie 
1996,  Section  2.1).  Such  a  trading  strategy  super-replicates  the  payoff  of  the  American 
option,  since  F{t,  n)  dominates  the  price  of  the  option  and  hence  its  payoff  at  exercise. 
Using  an  approximation  to  the  option  price,  we  can  define  tt,  and  F(^7r)  explicitly,  so 
that  super-replicating  the  option  can  be  a  relatively  straightforward  task.  In  particular, 
Boyle  et  al.  (1997)  and  Garcia  et  al.  (2000)  describe  Monte  Carlo  methods  for  estimating 
the  portfolio  strategies  replicating  the  present  value  process  of  a  state  contingent  claim. 
This  claim  could  correspond  to  a  derivative  security  or  some  consumption  process.  Their 
results  are  directly  applicable  to  (3). 


3.2     The  Upper  Bound  on  the  Option  Price 

When  the  supermartingale  ni  in  (3)  coincides  with  the  discounted  option  value  process, 
Vt/Bt,  the  upper  bound  F(0,  tt)  equals  the  true  price  of  the  American  option.  This 
suggests  that  a  tight  upper  bound  can  be  obtained  from  an  accurate  approximation,  Vt, 
by  defining  TTf  in  such  a  way  that  when  the  approximate  option  price,  V,,  coincides  with 
the  exact  price,  V(,  tt^  equals  the  discounted  option  price,  Vt/ Bt-  It  seems  natural  then 
to  use  either  of  the  following  two  definitions  of  ttj: 

(5) 
(6) 

(7) 

where  (x)"*"  :=  max(x,0).  By  construction,  E^  [ttj+i  —  (Tt  ]  <  0  for  either  definition  of 
TT,,  implying  tt^  is  an  adapted  supermartingale  for  any  approximation,  Vj.  Also,  when 
Vt  =  Vf,  Tit  =  Vt/Bt,  because  the  latter  process  is  a  supermartingale  and  the  positive 
part  of  expectations  in  (5,6)  and  (5,7)  equals  zero.  While  we  cannot  claim  a  priori  that 
the  upper  bound  corresponding  to  (5,6)  is  superior  to  the  bound  determined  by  (5,7), 
the  properties  of  the  bound  under  the  first  definition  are  considerably  easier  to  analyze. 
Note  also  that  the  upper  bound  can  be  tightened  further  by  omitting  the  positive 
part  in  the  definition  of  tt;.  The  resulting  process  is  a  martingale  (the  martingale  part 
of  TTf)  and  therefore  a  supermartingale,  so  that  it  too  can  be  used  to  construct  an  upper 
bound.  It  coincides  with  nt  at  t  =  0  and  is  always  greater  than  or  equal  to  nt  for  t  >  0. 
It  therefore  leads  to  a  lower  value  of  the  upper  bound  defined  by  (3).  (In  an  earlier  draft 
of  this  paper,  we  used  the  formulation  (5,6)  to  compute  upper  bounds  on  the  option 


Tfo  =  Vo 

TTf+l   =  TTj  + 

Vt 
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price.  Andersen  and  Broadie  (2001)  point  out,  however,  that  in  general  tighter  upper 
bounds  can  be  obtained  by  using  the  martingale  component  of  the  supermartingale,  tt^. 
In  our  framework,  the  martingale  component  is  obtained  by  simply  omitting  the  positive 
in  (5,6)). 

For  the  remainder  of  the  paper  we  will  therefore  take  ttj  to  be  defined  by 


TTo  =  Vo 


V„ 


T^t  +  l   =  TT,  + -5-  -  E( 

•D(+l  -Df 


Vt+i       Vt 
Bt+i      Bt 


(8) 
(9) 


Let  V'o  denote  the  upper  bound  corresponding  to  (8)  and  (9).  Then  it  is  easy  to  see  that 
the  upper  bound  is  explicitly  given  by 


Fn  =  Vn  +  En 


max  (    -;:; -^  +    y      tjj. 


t€T    \Bt         Bt 


Bj      Bj_i 


(10) 


The  following  theorem  relates  the  worst-case  performance  of  the  upper  bound  deter- 
mined by  (8)  and  (9)  to  the  accuracy  of  the  original  approximation,  Vf. 

Theorem  2   (Tightness  of  the  Upper  Bound)  Consider  any  approximation  to  the 
American  option  price,  Vt,  satisfying  Vt  >  hf.   Then 


Ko  <  K)  +  2  Eo 


Bt 


;ii) 


Proof  See  Appendix  A.l.    ■ 

Theorem  2  suggests  two  possible  reasons  for  why  the  upper  bound  may  be  limited 
in  practice.  First,  it  suggests  that  the  bound  may  deteriorate  linearly  in  the  number  of 
exercise  periods.   However,  this  is  a  worst  case  bound.   Indeed,  the  quantity  of  interest 


Eo 


max  \  — —  TIT  +  2_] Ej- 


Tif'  \  Bt       Bt 
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V      V 


J-1 


5,      B 


j-i 


(12) 


and  while  we  would  expect  it  to  increase  with  the  number  of  exercise  periods,  it  is  not 
clear  that  it  should  increase  linearly.  This  is  particularly  true  for  pricing  American 
options  when  we  typically  want  to  keep  the  horizon,  T,  fixed,  while  we  let  the  number 
of  exercise  times  in  [0,  T]  increase. 

The  second  reason  is  due  to  the  approximation  error,  \Vt  —  Vt\/Bt.  Tsitsiklis  and 
Van  Roy  (2000)  have  shown  that  under  certain  conditions,  and  for  certain  approximate 
dynamic  programming  algorithms,  this  error  can  be  bounded  above  by  a  constant,  in- 
dependent of  the  number  of  exercise  periods.  This  result,  however,  is  only  applicable 
to  perpetual  options  since  it  assumes  that  T  -^  oo  while  the  interval  between  exercise 
times  remains  constant.  As  mentioned  in  the  previous  paragraph,  however,  we  are  typ- 
ically interested  in  problems  where  T  is  fixed  and  the  interval  between  exercise  periods 
decreases.  In  this  case,  Tsitsiklis  and  Van  Roy  show  that  the  approximation  error  is 
bounded  above  by  a  constant  times  \/N,  where  A''  is  the  number  of  exercise  periods. 

These  two  observations  suggest  that  the  quality  of  the  upper  bound  should  deteriorate 
with  A^,  but  not  in  a  linear  fashion.  In  Section  5  we  shall  see  evidence  to  support  this 
when  we  successfully  price  options  with  as  many  as  100  exercise  periods. 

3.3     The  Lower  Bound  on  the  Option  Price 

To  construct  a  lower  bound  on  the  true  option  price,  we  define  the  Q-value  function  to 
be 

"  Bt 


Qt(Xt)  :=  Et 


R      Vt+AXt+i) 

■Dt+l 


(13) 


The  Q-value  at  time  t  is  simply  the  expected  value  of  the  option,  conditional  on  it  not 
being  exercised  at  time  t.  Suppose  that  an  approximation  to  the  Q-value,  Qt{Xt),  is 
available,  iov  t  =  1, ...  ,T  —  1.  Then,  to  compute  the  lower  bound  on  the  option  price, 
we  simulate  a  number  of  sample  paths  originating  at  A'q.  For  each  sample  path,  we  find 
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the  first  exercise  period  t,  if  it  exists,  in  which  h(Xt)  >  Qt{Xt).  The  option  is  then 
exercised  at  this  time  and  the  discounted  payoff'  of  the  path  is  given  by  h{Xt)/ Bt-  Since 
this  is  a  feasible  Tt  -  adapted  exercise  poHcy,  it  is  clear  that  the  expected  discounted 
payoff  from  following  this  policy  defines  a  lower  bound,  Vq,  on  the  true  option  price,  Vq. 
Formally,    f  =  min{f  E  T  :  Qt  <  ht}  and 


The  following  theorem  characterizes  the  worst-case  performance  of  the  lower  bound. 


Theorem  3   (Tightness  of  the  Lower  Bound)  The  lower  bound  on  the  option  price 
satisfies 


Vo  >  Vo  -  Eo 


y-  \Qt-  ' 


(14) 


Proof  See  Appendix  A. 2.    ■ 

While  this  theorem  suggests  that  the  performance  of  the  lower  bound  may  deteriorate 
linearly  in  the  number  of  exercise  periods,  numerical  experiments  indicate  that  this  is 
not  the  case.  Theorem  3  describes  the  worst  case  performance  of  the  bound.  However, 
in  order  for  the  exercise  strategy  that  defines  the  lower  bound  to  achieve  the  worst  case 
performance,  it  is  necessary  that  at  each  exercise  period  the  option  is  mistakenly  left  un- 
exercised, i.e.,  the  condition  Qt{Xt)  <  h{Xt)  <  Qt{Xt)  is  satisfied.  For  this  to  happen, 
it  must  be  the  case  that  at  each  exercise  period,  the  underlying  state  variables  are  close 
to  the  optimal  exercise  boundary.  In  addition,  Qt  must  systematically  overestimate  the 
true  value  Qt  so  that  the  option  is  not  exercised  while  it  is  optimal  to  do  so.  In  practice, 
the  variability  of  the  underlying  state  variables,  Xt,  might  suggest  that  Xt  spends  little 
time  near  the  optimal  exercise  boundary.  This  suggests  that  as  long  as  Qt  is  a  good 
approximation  to  Qt  near  the  optimal  exercise  frontier,  the  lower  bound  should  be  a 
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good  estimate  of  the  true  price,  regardless  of  the  number  of  exercise  periods. 

4     Implementation 

In  this  section  we  describe  in  some  detail  approximate  Q-value  iteration,  the  algorithm 
that  we  use  for  obtaining  the  initial  approximation  to  the  value  function,  Vj.  Algorithms 
of  this  kind  are  now  standard  in  the  approximate  dynamic  programming  literature  (see 
for  example,  Bertsekas  and  Tsitsiklis,  1996).  An  interesting  feature  of  the  particular 
algorithm  we  describe  is  that,  in  contrast  to  most  approximate  dynamic  programming 
algorithms,  it  is  deterministic.  This  deterministic  property  is  achieved  through  the 
use  of  low  discrepancy  sequences.  While  such  sequences  are  used  in  the  same  spirit  as 
independent  and  identically  distributed  sequences  of  random  numbers,  we  found  that 
their  application  significantly  improved  the  computational  efficiency  of  the  algorithm. 
They  are  described  in  further  detail  in  Appendix  B. 

4.1      Q- Value  Iteration 

As  before,  the  problem  is  to  compute 


Vo  =  sup  Eo 

T6T 


In  theory  this  problem  is  easily  solved  using  value  iteration  where  we  solve  for  the  value 
functions,  Vt,  recursively  so  that 

Vt  =  HXt)  (15) 


Vt  =  max     h{X,),E 


(16) 


The  price  of  the  option  is  then  given  by  V'o(Xo)  where  Xq  is  the  initial  state  of  the 
economy.    In  practice,  however,  if  d  is  large  so  that  Xt  is  high  dimensional,  then  the 
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'curse  of  dimensionality'  implies  that  value  iteration  is  not  practical. 

As  an  alternative  to  value  iteration  consider  again  the  Q-value  function,  which  was 
defined  earlier  to  be  the  continuation  value  of  the  option 


QtiXt)  =  E, 


Bt+i 


Vt+i{Xt+,] 


The  value  of  the  option  at  time  t  +  1  is  then 


(17) 


Vt+i{Xt+,)  =  miix{h(Xt+i).Qt+AXt+,)) 


(18) 


so  that  we  can  also  write 


QtiXt 


Bt 

Bt+i 


msLx{h{Xt+i),Qt+i{Xt+i)) 


(19) 


Equation  (19)  clearly  gives  a  natural  extension  of  value  iteration  to  Q-value  iteration. 
The  algorithm  we  use  in  this  section  consists  of  performing  an  approximate  Q-value 
iteration. 

There  are  a  number  of  reasons  for  why  it  is  preferable  to  concentrate  on  approximat- 
ing Qt  rather  than  approximating  Vt  directly.  Letting  Qt  and  Vt  denote  our  estimates 
of  Qt  and  Vt  respectively,  we  can  write  the  defining  equations  for  approximate  Q-value 
and  value  iteration  as  follows: 


QtiXt)  =  E, 


Bt 


max{hiXt+,),Qt+i{Xt+:, 


Vt{Xt)  =  nmx{h(X,),E 


Bt 

Bt+i 


Vt+dXt^i] 


(20) 
(21) 


The  functional  forms  of  (20)  and  (21)  suggest  that  Qt  is  smoother  than  14,  and  therefore 
more  easily  approximated. 

More  importantly,  however,  Qt  is  the  unknown  quantity  of  interest,  and  the  decision 
to  exercise  or  continue  at  a  particular  point  will  require  a  comparison  of  Qt{Xt)  and 
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h{Xt).  If  we  only  have  Vt  available  to  us  then  such  a  comparison  will  often  be  very 
difficult  to  make.  For  example,  if  Vt{Xi)  >  h{Xt)  then  we  do  not  exercise  the  option. 
However,  if  Vt{Xt)  is  only  marginally  greater  than  h{Xt),  then  it  may  be  the  case  that 
h{Xt)  >  Qt{Xt)  and  Vt{Xi)  is  actually  attempting  to  approximate  h{Xt).  In  this  situ- 
ation, we  misinterpret  Vi{Xt)  and  assume  that  it  is  optimal  to  continue  when  in  fact  it 
is  optimal  to  exercise.  This  problem  can  be  quite  severe  when  there  are  relatively  few 
exercise  periods  because  in  this  case,  there  is  often  a  significant  difference  between  the 
value  of  exercising  now  and  the  continuation  value.  When  we  have  a  direct  estimate  of 
Qt{Xt)  available,  this  problem  does  not  arise. 

4.2     Approximate  Q- Value  Iteration 

The  first  step  in  approximate  Q-value  iteration  is  to  select  an  approximation  architecture, 
\  Qt{-',  Pt)  :  /3t  e  5R^  [,  which  is  a  class  of  functions  from  which  we  select  Qt.  This  class 
is  parametrized  by  the  vector  Pt  £  5?^  so  that  the  problem  of  determining  Qt  is  reduced 
to  the  problem  of  selecting  Pt  where  pt  is  chosen  to  minimize  some  approximation  error. 
The  choice  of  architecture  does  not  seem  to  be  particularly  important  as  long  as  it  is 
sufficiently  'rich'  to  accurately  approximate  the  true  value  value. 

Possible  architectures  are  the  linearly  parametrized  architectures  of  Longstaff  and 
Schwartz  (2000)  and  Tsitsiklis  and  Van  Roy  (2000),  or  non-linearly  parametrized  archi- 
tectures such  as  neural  networks  or  support  vector  machines  (see  Vapnik  1999).  In  this 
paper  we  use  a  multi-layered  perceptron  with  a  single  hidden  layer,  a  particular  class 
of  neural  networks.  Multi-layered  perceptrons  with  a  single  hidden  layer  are  known 
to  possess  the  universal  approximation  property  so  that  they  are  able  to  approximate 
any  continuous  function  over  a  compact  set  arbitrarily  well,  provided  that  a  sufficient 
number  of  neurons  are  used  (see  Hornick,  Stinchcombe  and  White  1989). 
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The  second  step  in  the  procedure  is  to  select  for  each  t  =  I, . . . ,  K  —  I,  a  set 

Sr.=  {Pl.--.Ph,} 

of  training  points  where  P,[  G  JR*^  for  n  =  1, .  . . ,  A^j.  It  makes  sense  to  choose  the  sets 
St  in  such  a  way  that  they  are  representative  of  the  probabihty  distribution  of  Xt-  We 
do  this  using  low  discrepancy  sequences  so  that  if  Nt  is  the  desired  number  of  training 
points,  then  we  simply  take  Nt  points  from  a  particular  low  discrepancy  sequence.  Using 
the  technique  described  in  Appendix  B,  it  is  then  straightforward  to  convert  these  points 
into  training  points  that  represent  the  distribution  of  Xt-  Of  course,  it  is  also  possible  to 
select  the  training  points  by  simply  simulating  from  the  distribution  of  the  state  vector, 
A'(.  Our  Umited  experience  shows  that  both  simulation  and  low  discrepancy  sequences 
work  very  well  in  practice.  The  performace  of  the  low  discrepancy  sequences,  however, 
appeared  to  be  marginally  superior  when  applied  to  the  problems  we  consider  in  this 
paper. 

The  third  step  is  to  perform  a  training  point  evaluation.  Defining  Qt  =  0,  we  begin 
with  t  =  K  —  1  and  for  n  =  1, . . .  ,Nt,  we  use  Qt{Pn)  to  estimate  Qt{Pi)  where 


Qt{Pl)  ■■=  Et 


_Bt 


-  max  (h{Xt+i),  Qt+i{Xt+i) 
1  ^ 


(22) 


The  operator  E[  .  ]  in  (22)  is  intended  to  approximate  the  expectation  operator,  E[  .  ]. 
This  is  necessary  as  it  is  usually  not  possible  to  compute  the  expectation  exactly  on 
account  of  the  high  dimensionality  of  the  state  space.  For  example,  E[  .  ],  could  corre- 
spond to  Monte  Carlo  simulation  where  we  simulate  from  the  distribution  oi  Xt+i  given 
that  Xt  =  PI  Then  E[  .  ]  is  defined  by 

M 


Bt 
Bt+\ 


m&x(^h{Xt+i),Qt+i{Xt+i))     :=^^^|^5]max(M:i--(),4+i(-i^())         (23) 
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where  M  xis  are  drawn  randomly  from  the  conditional  distribution  of  Xt+\-  The  prob- 
lem with  this  method  is  that  the  rate  of  convergence  to  the  true  expectation  is  0(-^) 
which  can  be  too  slow  for  our  purposes.  Instead,  we  use  a  low  discrepancy  sequence  to 
generate  the  x;'s,  with  M  chosen  in  advance.  For  the  5-dimensional  problems  of  Section 
5,  M  was  set  equal  to  1000.  For  the  10-dimensional  problems,  we  set  M  equal  to  2000. 
Had  we  used  Monte  Carlo  sinuilation,  then  in  order  to  achieve  a  comparable  level  of 
accuracy,  a  substantially  larger  value  of  M  would  have  been  required. 
Finally,  we  estimate  Qt  with  Qt{,  \t3t)  where 

Q,(x;  A)  =  Y.r,{j)e  U(j)  +  J2  (^'O'-O^IO)  )  (24) 

and 

K  _  2 

^  =  &vgmmY,[Qt{K)-Q>{Pn-,Pt))    ■  (25) 

n  =  l 

The  parameters  fe((j),  rt{j)  and  r,(j, /)  in  (24)  constitute  the  parameter  vector  Pt,  while 
9{.)  denotes  the  logistic  function  so  that 

e{x)  =  —^ . 

K  refers  to  the  number  of  'neurons',  and  d  is  the  dimensionality  of  the  input  vector,  x. 
While  the  input  vector  x  may  simply  correspond  to  a  sample  state  vector,  it  is  common 
to  augment  x  with  certain  features,  that  is,  easily  computed  functions  of  the  current 
state  vector.  From  an  informational  point  of  view,  features  add  no  new  information  to 
the  neural  network.  However,  they  often  make  it  easier  for  the  neural  network  (or  other 
approximation  architecture)  to  approximate  the  true  value  function. 

In  practice,  we  usually  minimize  a  variant  of  the  quantity  in  (25)  in  order  to  avoid  the 
difficulties  associated  with  overfitting.  This  is  done  using  cross  validation,  an  approach 
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that  requires  the  training  points  to  be  divided  into  three  sets,  namely  training,  vahdation 
and  test  sets.  Initially,  only  the  training  and  validation  sets  are  used  in  the  minimization 
so  that  the  quantity 

Y,{Qt{P!)-Qt{P!;Pt)y'  (26) 

is  minimized  where  the  sum  in  (26)  is  taken  over  points  in  the  training  set.  The  minimiza- 
tion is  performed  using  the  Levenberg  Marquardt  method  for  least  squares  optimization 
(see  Bertsekas  and  Tsitsiklis  1996).  At  each  iteration  of  the  minimization,  the  error  in 
the  validation  set  is  also  computed  and  as  long  as  overfitting  is  not  taking  place,  then 
the  validation  error  should  decrease  along  with  the  training  set  error.  However,  if  the 
validation  error  starts  increasing  at  any  point  then  it  is  likely  that  overfitting  is  taking 
place.  The  algorithm  then  terminates  if  the  validation  error  increases  for  a  prespecified 
number  of  iterations,  and  pt  is  then  set  equal  to  the  value  of  pt  in  the  last  iteration  of 
the  minimization  before  the  validation  error  began  to  increase. 

There  is  one  further  difficulty  with  the  neural  network  architecture  that  needs  to  be 
addressed.  The  neural  network  will  typically  have  many  local  minima  and  it  is  often  the 
case  that  the  algorithm  will  terminate  at  a  local  minimum  that  is  far  from  the  global 
minimum.  In  this  case,  it  is  necessary  to  repeat  the  minimization  again,  this  time  using 
a  different  starting  value  for  pt.  This  may  be  repeated  until  a  satisfactory  local  minimum 
has  been  found,  where  'satisfactory'  refers  to  performance  on  the  test  set. 

We  use  this  training  algorithm  for  finding  Qt-\-  For  the  remaining  Q- value  functions, 
however,  the  problem  is  now  somewhat  simplified  since  it  is  usually  the  case  that  Qt  ~ 
Qt-\-  We  can  therefore  use  pt  as  the  initial  solution  for  training  the  time  t  —  1  neural 
network.  In  practice,  this  means  that  the  other  neural  networks  can  be  trained  very 
quickly  and  that  we  only  need  to  perform  the  minimization  once.  It  also  means  that  we 
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can  dispense  with  the  need  for  having  a  test  set  for  all  but  the  terminal  neural  network. 
Once  Qt  has  been  found,  we  then  iterate  in  the  manner  of  value  iteration  until  we 
have  found  Qq.  We  could  then  use 


Vo{Xo)  =  max{h{Xo),Qo{X, 


(27) 


to  estimate  the  value  of  the  option.  While  such  an  estimate  may  be  quite  accurate,  it  is 
of  limited  value  since  we  can  say  very  little  about  the  estimation  error.  In  addition,  we 
do  not  have  at  hand  an  exercise  strategy  that  has  expected  value  equal  to  V'o(Xo)  nor 
do  we  know  if  such  a  strategy  even  exists.  Finally,  it  provides  little  information  with 
regards  to  hedging  the  option. 

4.3     Computing  the  Upper  and  Lower  Bounds 

In  Section  3.1  we  showed  that  an  upper  bound  for  the  price  of  the  American  option 
is  given  by  (10).  However,  in  (10)  we  can  alternatively  set  Vq  =  Vq,  where  V^  is  the 
estimated  lower  bound.  (Since  Vq  >  ho,  this  new  definition  satisfies  %  >  ht  for  all  t, 
and  so  Theorem  2  continues  to  hold.)  In  this  case,  the  upper  bound  is  given  by 


Vn  =  Vn  +  En 


max  I  7^  -  1^  +  >     cjj- 
ter   \Bt       Bt      ^    ' 


K-        K 


j-i 


B,       B 


j-i 


(28) 


This  then  gives  a  natural  decomposition  of  the  upper  bound  into  a  sum  of  two  compo- 
nents. The  first  component  is  the  estimated  lower  bound,  while  the  second  component 
in  some  sense  measures  the  extent  to  which  the  discounted  approximate  value  function 
fails  to  be  a  supermartingale. 

We  estimate  Vq  by  simulating  sample  paths  of  the  state  variables,  evaluating 


Vt      V- 

max  I    -;::: T^  +  2_^  ^J" 


ter   I  Bt       Bt 


j=i 


Bj      Bj-i 


(29) 
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along  each  path  and  taking  the  average  over  all  paths.   Evaluating  (29)  numerically  is 
time  consuming  since  at  each  point,  {t,Xt),  on  the  path,  we  need  to  compute 


-j-i 


V,       V, 


j-i 


(30) 


Bj      Bj-i 

Any  unbiased  noisy  estimate  of  the  expectation  in  (30),  however,  will  result  in  an  upwards 
biased  estimate  of  the  upper  bound,  due  to  the  application  of  the  maximum  operator. 
It  is  therefore  important  that  accurate  estimates  of  the  expectation  in  (30)  can  be 
computed.  As  before,  we  again  use  low  discrepancy  sequences  to  estimate  these  high- 
dimensional  integrals.  Since  we  wish  to  compute  an  accurate  estimate  of  the  upper 
bound,  it  is  important  to  use  a  good  stopping  criterion  to  determine  the  number  of 
points  that  will  be  used  to  estimate  the  expectation  in  (30). 

Let  E(A^)  denote  the  estimate  of  (30)  when  A^  low  discrepancy  points  are  used.  For 
some  fixed  value  of  M,  we  then  examine  E{Mi)  for  i  =  1, . . . ,  L  and  terminate  either 
when 

|E(Af(z  +  l))-E(Mz)|  <e  (31) 

or  when  i  =  L,  if  the  condition  in  (31)  is  not  satsified  for  any  i  <  L. 

In  the  numerical  results  of  Section  5,  we  set  M  equal  to  2000  and  4000  for  the  5 
and  10-dimensional  problems,  respectively.  For  all  problems,  we  set  e  =  .2  cents  and 
L  —  48000.  These  values  typically  result  in  estimates  of  the  upper  bound  that  appear  to 
be  very  accurate  for  practical  purposes.  This  observation  is  based  upon  further  numerical 
experiments  where  M  and  e  were  increased  and  decreased  respectively.  The  results  of 
these  experiments  invariably  resulted  in  estimates  of  the  upper  bound  that  were  within 
1  or  2  cents  of,  and  often  considerably  closer  to,  the  original  estimate. 

One  possible  concern  about  the  use  of  low  discrepancy  sequences  for  estimating 
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the  upper  bound  is  that  even  a  small  bias  in  the  estimate  of  the  expectation  in  (30) 
might  result  in  a  considerable  bias  in  the  estimate  of  the  upper  bound  when  there  are 
many  exercise  periods.  This  is  not  a  problem,  however,  for  reasons  mentioned  before. 
In  particular,  as  the  number  of  exercise  periods  increases,  the  time  interval  between 
exercise  periods  decreases  which  means  that  the  variability  of  the  value  function  in  the 
next  period  also  decreases.  As  a  result,  the  conditional  expectation  in  (30)  can  be 
estimated  more  accurately  for  fixed  values  of  M  and  e. 

4.4     An  Automated  Pricing  Algorithm 

Perhaps  the  obvious  way  to  compute  the  lower  and  upper  bounds  is  in  a  sequential 
fashion  so  that  after  estimating  the  Q-value  functions,  we  simulate  a  number  of  sample 
paths  to  compute  the  lower  bound,  and  simulate  another  set  to  compute  the  upper 
bound.  One  difficulty  with  this  strategy,  however,  is  that  the  difference  between  V^  and 
Vq  might  be  significant  so  that  there  is  a  large  duality  gap. 

When  this  occurs,  we  are  forced  to  re-estimate  the  Q/s,  possibly  using  more  train- 
ing points  or  a  more  flexible  approximation  architecture.  This  process  may  need  to 
be  repeated  a  number  of  times  before  we  obtain  a  sufficiently  small  duality  gap.  For 
American  options  that  need  to  be  priced  regularly,  this  is  unlikely  to  be  a  problem  since 
experience  should  allow  us  to  determine  in  advance  the  appropriate  parameter  settings. 

However,  for  pricing  more  exotic  American  options,  we  might  need  to  use  such  an 
ad  hoc  approach.  This  might  be  very  inefficient  in  practice,  so  we  now  briefly  outline  a 
method  to  address  this  problem. 

We  have  already  mentioned  that  while  the  upper  bound  should  work  very  well  in 
practice,  the  lower  bound  should  still  be  superior.  This  observation  is  borne  out  in 
Section  5  when  we  price  call  options  on  the  geometric  mean  of  a  number  of  stocks.  For 
these  problems,  where  it  is  actually  possible  to  compute  option  prices  exactly,  we  see 
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that  the  lower  bound  is  closer  than  the  upper  bound  to  the  true  price. 

This  then  suggests  that  when  there  is  a  large  duality  gap  it  is  usually  because  the 
upper  bound  is  not  sufficiently  close  to  Vq.  The  expression  for  the  upper  bound,  as 
given  in  (28),  suggests  that  if  E(_i  V(/5(—  14-i/5(-i  tends  to  be  large,  then  the 
upper  bound  will  not  be  very  tight.  Based  on  this  observation,  we  propose  the  following 
modification  to  the  approximate  Q-value  iteration. 

After  Qt  has  been  computed,  we  do  not  proceed  directly  to  computing  Qt-\-  Instead, 
we  simulate  a  number  of  points  from  the  distribution  of  Xt  and  for  each  point,  we 
compute  E(  Vt+\/ Bt+\  —  V't/Bt  .  If  the  average  value  of  these  quantities  is  below  some 
threshold,  e^,  then  believing  that  we  have  a  good  estimate  of  l^,  we  proceed  to  estimate 
Qt-i-  Otherwise  we  re-estimate  Qt,  either  by  increasing  the  number  of  time  t  training 
points  or  by  refining  the  approximation  architecture,  depending  on  the  remedy  that 
seems  more  appropriate.  We  then  repeat  the  process  until  the  average  is  less  than  e(. 
The  resulting  estimates  of  the  Q-value  functions  should  lead  to  tight  lower  and  upper 
bounds. 

A  further  advantage  of  this  proposal  is  that  it  allows  us  to  determine  how  much 
computational  effort  is  required  to  obtain  a  good  solution.  In  particular,  we  can  now  de- 
termine online  how  many  training  points  are  needed  or  how  complex  the  approximation 
architecture  needs  to  be  in  order  to  obtain  good  estimates  of  the  option  price. 

5     Numerical  Results 

In  this  section  we  illustrate  our  methodology  by  pricing  call  options  on  the  maximum 
of  5  and  10  stocks  respectively,  and  the  geometric  mean  of  5  stocks.  We  do  not  present 
results  for  call  options  on  the  geometric  mean  of  10  stocks  since  this  problem  is  in  fact 
easier  to  solve  than  the  5  stock  case.  While  somewhat  counterintuitive,  this  is  explained 
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by  noting  that  the  volatility  of  the  geometric  mean  decreases  as  the  number  of  stocks 
increases. 

In  the  problems  that  follow,  we  assume  that  the  market  has  A'^  traded  securities  with 
price  processes  given  by 

dSl  =  Sl[{r  -  5,)dt  +,a  dZl\  (32) 

where  Z[  is  a  standard  Brownian  motion  and  the  instantaneous  correlation  of  Z]  and 
Z/  is  pij.  Each  security  pays  dividends  at  a  continuous  rate  of  (5j.  We  assume  that 
the  option  expires  at  time  T  and  that  there  are  n  equally  spaced  exercise  dates  in  the 
interval  [0,T].  The  first  date  occurs  at  i  =  0  which  we  call  the  P'  exercise  period  and 
the  n"'  exercise  date  occurs  at  i  =  T.  We  use  k  to  denote  the  strike  price  of  the  option 
and  let  r  be  the  annual  continuously  compounded  interest  rate.  It  is  assumed  that  r  is 
constant  though  this  assumption  is  easily  relaxed. 

In  order  to  generate  the  initial  approximation  to  the  option  price,  certain  problem 
specific  information  was  also  used.  For  example,  we  have  already  mentioned  feature 
extraction,  where  functions  of  the  current  state  vector  are  used  as  inputs  for  the  neural 
network.  In  all  the  problems  of  this  section,  we  found  it  advantageous  to  order  the 
stock  prices  before  using  them  as  inputs  to  the  neural  network.  In  addition,  we  used 
the  current  intrinsic  value  of  the  option  as  a  feature  for  options  on  the  maximum,  while 
for  options  on  the  geometric  mean,  the  corresponding  European  option  value  was  used. 
Though  it  is  true  in  general  that  the  exact  European  value  of  a  high-dimensional  option 
will  not  be  available,  this  is  not  important  since  it  is  known  (see  Hutchinson  et  al  (1994) 
that  European  options  prices  can  be  quickly  and  accurately  approximated  using  learning 
networks  .  (This  observation  also  suggests  another  way  of  implementing  the  approximate 
dynamic  programming  algorithm.    Instead  of  using  the  approximation  architecture  to 
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estimate  the  Q- value,  we  could  use  it  to  estimate  the  early  exercise  premium,  conditional 
on  not  exercising  the  option  at  the  current  exercise  period.  This  conditional  early  exercise 
premium  plus  the  estimated  European  option  price  might  give  a  more  accurate  estimate 
of  the  Q- value  function.) 

Another  method  by  which  problem  specific  information  was  used  is  policy  fixing  (see 
Broadie  and  Glasserman  1997).  We  know,  for  example,  that  the  American  option  price 
is  greater  than  or  equal  to  the  price  of  the  corresponding  European  option.  Such  infor- 
mation can  easily  be  incorporated  to  the  approximate  dynamic  programming  algorithm. 
We  do  this  by  simply  redefining  the  estimated  Q-value  to  be  the  maximum  of  the  es- 
timated value  function  one  time  period  ahead,  and  a  European  option  that  is  a  lower 
bound  on  the  Q-value.  For  options  on  the  geometric  mean,  we  again  use  the  corre- 
sponding European  option  value  to  bound  the  Q-value  from  below.  For  options  on  the 
maximum,  we  use  the  European  option  on  the  stock  that  is  most  'in  the  money'. 

The  initial  approximation  to  the  option  price  was  obtained  using  2000  and  2500 
training  points  per  period  for  the  5  and  10-dimensional  problems,  respectively.  We  also 
assigned  65%,  25%  and  10%  of  the  training  points  for  the  time  T  -  I  neural  network  to 
the  training,  validation  and  test  sets  in  turn.  This  network  was  trained  a  maximum  of 
5  times  and  stopped  as  soon  as  the  mean-squared  error  on  the  test  set  was  less  than  1 
cent.  The  remainder  of  the  neural  networks  were  trained  only  once,  using  the  solution 
of  the  time  t  +  1  network  as  the  starting  point  for  training  the  time  t  network.  For 
these  networks,  70%  of  the  training  points  were  assigned  to  the  training  set  with  the 
remainder  assigned  to  the  validation  set. 

The  lower  bound  was  obtained  by  simulating  the  approximate  exercise  strategy  along 
8  million  sample  paths.  The  upper  bound  was  computed  using  1,000  sample  paths  to 
estimate  the  expectation  in  (28). 
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5.1  Call  on  the  Maximum  of  5  Assets 

We  assume  that  there  are  5  assets,  r  =  0.05,  T  =  3  years  and  k  =  100.  We  let  5,  =  0.1, 
(7,  =  0.2  and  p,j  =  0  for  i  ^  j.  All  stocks  are  assumed  to  have  the  same  initial  price  Sq. 
The  results  are  given  in  Table  1  for  problems  where  there  are  10,  25,  50  and  100  time 
periods. 

It  can  be  seen  that  the  estimated  lower  and  upper  bounds  are  typically  very  close, 
thereby  providing  very  accurate  estimates  of  the  true  price.  While  it  is  true  that  the 
duality  gap  widens  with  the  number  of  exercise  periods  it  does  so  quite  gradually  so  that 
even  for  the  problems  with  100  exercise  periods,  we  can  still  obtain  very  good  estimates. 
As  we  argued  before,  this  widening  of  the  duality  gap  is  typically  due  to  the  gradual 
deterioration  of  the  upper  bound  as  the  number  of  exercise  periods  increases.  We  will 
see  further  evidence  to  support  this  when  we  examine  options  on  the  geometric  mean. 

In  Table  1  we  also  report  the  prices  of  the  corresponding  European  options  which 
allow  us  to  compute  the  early  exercise  premia  of  the  American  options.  We  see  that 
the  duality  gap  is  approximately  1%  of  the  early  exercise  premium  for  options  with  25 
exercise  periods  or  less.  For  problems  with  as  many  as  50  or  100  exercise  periods,  the 
duality  gap  is  between  1%  and  3%.  Using  the  midpoint  of  the  lower  and  upper  bounds 
should  therefore  enable  us  to  price  the  early  exercise  premium  of  these  options  to  within 
1%  or  2%.  Even  more  accurate  price  estimates  could  be  obtained  by  noting  that  the 
lower  bound  is  usually  closer  to  the  true  price  than  the  upper  bound. 

5.2  Call  on  the  Geometric  Mean  of  5  Assets 

We  assume  that  there  are  5  assets,  r  =  0.03,  T  =  1  years  and  k  =  100.  We  let  J,  =  0.05, 
a,  =  0.4  and  pij  =  0  for  i  ^  j.  All  stocks  are  again  assumed  to  have  the  same  initial 
price  So-  The  results  are  given  in  Table  1  for  problems  where  there  are  10,  25,  50  and 
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100  time  periods. 

For  tlie  American  call  option  on  the  geometric  mean  of  a  collection  of  stocks  the  true 
price  of  the  option  can  be  computed  using  a  standard  binomial  tree,  since  the  stochastic 
process  that  describes  the  evolution  of  the  geometric  mean  is  itself  a  geometric  Brownian 
motion.  We  therefore  report  the  true  price  of  the  American  options  on  the  geometric 
mean  together  with  our  niunerical  results  in  Table  2. 

The  results  are  similar  to  those  in  Table  1,  though  the  duality  gap,  measured  as  a 
percentage  of  the  early  exercise  premium,  now  tends  to  be  somewhat  wider  than  before 
for  the  options  that  start  out  of  the  money.  Again,  this  duality  gap  increases  with  the 
number  of  exercise  periods  though  at  a  very  gradual  pace. 

5.3      Call  on  the  Mciximuni  of  10  Assets 

We  make  the  same  assumptions  for  the  call  option  on  the  maximum  of  10  assets  as  we 
did  for  the  5  asset  case  except  now  T  =  I  year.  The  results  are  displayed  in  Table 
3.  Measured  in  absolute  terms,  the  duality  gap  is  again  very  small  for  these  problems. 
However,  measured  as  a  percentage  of  the  early  exercise  premium,  the  duality  gap, 
though  still  quite  small,  can  be  as  large  as  20%.  This  could  be  due  to  a  number  of 
reasons. 

First,  the  early  exercise  premium  now  represents  a  much  smaller  component  of  the 
overall  option  vahie  than  before,  so  that  using  the  duality  gap  (measured  as  a  percentage 
of  the  early  exercise  premium)  to  measure  performance  is  likely  to  accentuate  pricing 
errors.  This  problem  could  possibly  be  overcome  by  approximating  the  early  exercise 
premium  rather  than  the  continuation  value  of  the  option,  as  mentioned  earlier. 

The  more  likely  reason  for  the  wider  duality  gap  is  that  an  insufficient  number  of 
training  points  or  neurons  were  used.  Indeed,  while  we  doubled  the  dimensionality  of 
the  state  space,  we  made  only  a  moderate  increase  in  the  number  of  training  points 
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and  did  not  increase  at  all  the  number  of  neurons.  As  a  result,  we  expect  performance 
to  suffer.  Consider,  for  example,  the  50  exercise  period  problem  with  So  =  $90.  The 
estimated  lower  bound  for  this  problem,  $15,181,  is  greater  than  the  estimated  lower 
bound,  $15,178,  for  the  corresponding  100  period  problem.  Since  we  know  that  the  true 
option  price  for  the  100  period  problem  is  greater  than  the  true  option  price  for  the 
corresponding  50  period  problem,  it  is  clear  that  the  initial  option  price  approximation 
for  the  100  period  problem  is  inferior.  Since  the  upper  bound  is  quite  sensitive  to  the 
initial  approximation,  we  are  not  surprised  to  see  that  the  duality  gap  for  the  100  period 
problem  is  considerably  wider  than  the  duality  gap  for  the  50  period  problem. 

When  we  resolved  the  100  period  problem  using  4,000  training  points  per  period 
and  25  neurons,  the  estimated  lower  bound  increased  to  $15,196,  while  the  upper  bound 
decreased  to  $15,228.  The  resulting  duality  gap  is  only  3  cents  which,  for  all  practical 
purposes,  is  very  tight. 

6     Conclusions 

In  this  paper  we  have  developed  a  new  method  for  pricing  and  exercising  American 
options.  Our  approach  is  based  on  approximate  dynamic  programming  using  nonlinear 
regression  to  approximate  the  option  price.  Our  main  theoretical  result  is  a  representa- 
tion of  the  American  option  price  as  a  solution  of  a  dual  minimization  problem.  Based 
on  this  dual  characterization  of  the  price  function,  we  use  Monte  Carlo  simulation  to 
construct  tight  upper  and  lower  bounds  on  the  option  price.  These  bounds  do  not  rely  on 
a  specific  approximation  algorithm  and  can  be  used  in  conjunction  with  other  methods 
for  pricing  American  options.  We  characterize  the  theoretical  worst-case  performance  of 
the  pricing  bounds  and  show  that  they  are  very  accurate  on  a  set  of  sample  problems 
where  we  price  call  options  on  the  maximum  and  the  geometric  mean  of  a  collection 
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of  stocks.  These  numerical  results  suggest  that  our  pricing  method  can  be  successfully 
applied  to  problems  of  practical  interest. 
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A     Proofs 

A.l      Proof  of  Theorem  2 

Simplifying  (8)  and  (9),  and  using  Theorem  1,  we  obtain 


Vo  =  V'o  +  Eo 


max  I  ^; —  -^  +  >    Ej- 


teT   \Bt       Bt 


j=i 


(Al) 


as  an  upper  bound  on  tlie  price  of  the  American  option.  We  then  have 
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where  the  second  inequaUty  is  due  to  the  supermartingale  property  of  the  discounted 
option  price  process,  Vf,  and  the  last  step  follows  from  the  triangle  inequality.  The  result 
of  Theorem  2  then  follows. 

A. 2     Proof  of  Theorem  3 

At  time  t,  the  following  six  mutually  exclusive  events  are  possible:    (i)  Qt  <Qt  <  ht, 

(ii)  Qt<Qt<  ht,   (iii)  Qt  <  ht  <  Qt.   (iv)  Qt  <  ht  <  Qt,   (v)  ht  <  Qt  <  Qt,   (vi)  Ih  <  Qt  <  Qt- 

We  define  f,  =  min{s  €  [f,  T]  n  T  :Q~,  <  /ij  and 


Vt  =  BtEt 


For  each  of  the  six  scenarios  above,  we  establish  a  relation  between  the  lower  bound  and 
the  true  option  price. 


(i),(ii)  The  algorithm  for  estimating  the  lower  bound  correctly  prescribes  immediate 
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exercise  of  the  option  so  that  Vt—  y^  =  0. 

(ill)  In  this  case  the  option  is  exercised  incorrectly.    ]4  =  ht  and  Vt  =  Qt  implying 

Vt-V<  \Qt-Qt\. 

(iv)  In  this  case  the  option  is  not  exercised  though  it  is  optimal  to  do  so.  Therefore 

Bt 


Vt  =  -^Et  \VtJ 


while 


Bt 


Vt  =  ht<Qt+  {Qt-  Qt)  =  -^Et  [Vt+,]  +[Qt-Q 
This  implies 

Vt-V<\Qt-Qt\  +^E,  \Vt+x  -  Vt+i] . 

(v),(vi)  In  this  case  the  option  is  correctly  left  unexercised  so  that 

Vt-Vt  =  -^Et  \Vt+i  -  Vt+i]  . 
Therefore  by  considering  the  four  possible  scenarios,  we  find  that 
Vt-Vt<  \Qt  -  Qt\  +^Ef  \Vt+i  -  Vt+x]  . 
Iterating  and  using  the  fact  that  Vr=    Vt  implies  the  result. 
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B     Low  Discrepancy  Sequences 

A  low  discrepancy  sequence  is  a  deterministic  sequence  of  points  that  is  evenly  dispersed 
in  some  fixed  domain.  Often,  and  without  loss  of  generality,  we  take  this  domain  to  be  the 
unit  cube  [0, 1]''.  Because  the  points  in  a  low  discrepancy  sequence  are  evenly  dispersed, 
they  are  often  used  to  numerically  integrate  some  function  /(.)  over  [0,  Ij'^so    that 

/■       f{x)dx^^^^l^  (Bl) 

where  {y,  :i  =  1, . . . ,  A^},  is  a  set  of  A^  consecutive  terms  from  the  low  discrepancy  se- 
quence. An  important  property  that  low  discrepancy  sequences  possess  is  that  as  new 
terms  are  added,  the  sequence  remains  evenly  dispersed.  This  property  implies  that, 
in  contrast  to  other  numerical  integration  schemes,  the  term  A'^  in  (Bl)  need  not  be 
determined  in  advance  and  can  therefore  be  chosen  according  to  some  termination  cri- 
terion. Because  these  sequences  are  evenly  dispersed,  their  use  in  numerical  integration 
often  results  in  a  rate  of  convergence  that  is  much  faster  than  Monte  Carlo  simulation 
where  the  convergence  rate  is  0(-^).  For  the  technical  definition  of  a  low  discrepancy 
sequence  and  a  more  detailed  introduction  to  their  properties  and  financial  applications, 
see  Boyle,  Broadie  and  Glasserman  (1997).  See  Birge  (1994),  Joy,  Boyle  and  Tan  (1996) 
and  Paskov  and  Traub  (1995)  for  some  of  these  applications. 

In  this  paper  we  use  low  discrepancy  sequences  for  training  point  selection  and  train- 
ing point  evaluation.  The  low  discrepancy  sequences  are  of  particular  value  for  training 
point  evaluation  since  a  good  estimate  of 


E, 


_Bt 
B 


^max(M.Yt+i),Q,4-i(A-,+i)) 


can  usually  be  computed  much  faster  by  using  a  low  discrepancy  sequence  in  place  of 
Monte  Carlo  simulation. 
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Even  though  a  low  discrepancy  point  y  G  [0,  1]"^  is  deterministic  it  can  be  useful  to 
interpret  it  as  being  sampled  from  a  uniform  distribution  in  [0,  1]''.  With  this  in  mind, 
it  is  then  straightforward  to  convert  y  into  a  point,  x,  that  is  representative  of  a  random 
variable  Z  with  cumulative  distribution  function  F(.).  For  example,  suppose  Z  is  a 
d-dimensional  standard  normal  random  variable  with  correlation  matrix  equal  to  the 
identity.  We  can  then  construct  a  point  2  6  5R'^  that  is  representative  of  Z  by  setting 

z  =  F-\y)  (B2) 

where  the  operation  F~^  is  taken  componentwise  in  (B2).  If  instead  we  wish  to  'simulate' 
a  multivariate  normal  random  variable,  X  with  covariance  matrix  E,  then  we  simply 
premultiply  z  by  the  Cholesky  decomposition  of  S. 

In  financial  applications  random  variables  are  often  assumed  to  be  lognormally  dis- 
tributed. Since  transforming  a  normal  random  variable  into  a  lognormal  random  vari- 
able is  easy,  however,  we  can  do  likewise  with  z.  We  can  therefore  easily  convert  a 
d-dimensional  low  discrepancy  sequence  into  a  sequence  of  points  that  is  representative 
of  a  d-dimensional  multivariate  log-normal  distribution.  The  low  discrepancy  sequences 
we  use  in  this  paper  are  Sobol  sequences. 
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Table  1:  Call  on  the  mciximum  of  5  assets 

Table  1  contains  estimates  of  the  price  of  an  American  call  option  on  the  maximum  of  5  assets. 
We  use  the  following  set  of  parameter  values:  r  =  0.05,  T  —  3,  k  —  100,  6^  —  0.1,  a^  —  0.2 
and  pij  =  0  for  i,j  =  1,  ...,5.  All  stocks  are  assumed  to  have  the  same  initial  price  Sq.  The 
columns  "Lower  Bound"  and  "Upper  Bound"  contain  estimates  of  the  lower  and  upper  bounds, 
respectively.  The  standard  errors  of  these  estimates  are  given  in  brackets.  Results  are  displayed 
for  problems  with  10,  25,  50  and  100  time  periods.  We  report  estimated  options  prices  in  $'s 
and  their  standard  errors  in  cents. 

So      Lower  Bound     Upper  Bound     European  Price 


90 
100 
110 

90 

100 

no 

90 

100 

no 

90 

100 

no 


10  exercise  periods 


16.640 

16.658 

14.586 

(0.57) 

(0.49) 

26.151 

26.177 

23.052 

(0.68) 

(0.46) 

36.758 

36.826 

32.685 

(0.77) 

(1.48) 

25  exercise  periods 


16.866 

16.889 

14.586 

(0.56) 

(0.67) 

26.465 

26.511 

23.052 

(0.67) 

(1.31) 

37.172 

37.207 

32.685 

(0.75) 

(0.99) 

50  exercise  periods 


16.949 

16.989 

14.586 

(0.55) 

(1.57) 

26.555 

26.656 

23.052 

(0.67) 

(2.85) 

37.280 

37.350 

32.685 

(0.75) 

(1.59) 

100  exercise  periods 


16.962 

17.030 

14.586 

(0.56) 

(2.18) 

26.611 

26.666 

23.052 

(0.66) 

(1.84) 

37.332 

37.442 

32.685 

(0.75) 

(2.47) 
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Table  2:  Call  on  the  geometric  mean  of  5  assets 

Table  2  contains  estimates  of  the  price  of  an  American  call  option  on  the  geometric  mean  of 
5  assets.  We  use  the  following  set  of  parameter  values:  r  —  0.03,  T  =  1,  k  =  100,  6i  —  0.05, 
CT(  =  0.4  and  pij  —  0  for  i,j  —  1, ...,  5.  All  stocks  are  assumed  to  have  the  same  initial  price  Sq. 
The  columns  "Lower  Bound"  and  "Upper  Bound"  contain  estimates  of  the  lower  and  upper 
bounds,  respectively.  The  standard  errors  of  these  estimates  are  given  in  brackets.  Results 
are  displayed  for  problems  with  10,  25,  50  and  100  time  periods.  We  report  estimated  options 
prices  in  $'s  and  their  standard  errors  in  cents. 

Sq      Lower  Boundd     Upper  Bound     True  Price     European  Price 
10  exercise  periods 


90 

1.359 
(.13) 

1.366 

(.08) 

1.359 

1.172 

100 

4.282 
(.21) 

4.292 
(.08) 

4.282 

3.445 

110 

10.177 
(.25) 

10.188 
(.12) 

25  exercise 

10.179 
periods 

7.521 

90 

1.380 
(0.13) 

1.390 
(0.14) 

1.381 

1.172 

100 

4.340 
(0.21) 

4.353 

(0.14) 

4.342 

3.445 

110 

10.363 
(0.21) 

10.374 
(0.19) 

50  exercise 

10.365 
periods 

7.521 

90 

1.387 
(0.13) 

1.396 

(0.19) 

1.388 

1,172 

100 

4.360 
(0.20) 

4.376 
(0.21) 

4.361 

3.445 

110 

10.409 
(0.20) 

10.427 

(0.27) 

100  exercise 

10.411 
periods 

7.521 

90 

1.389 

(0.12) 

1.400 
(0.52) 

1.391 

1.172 

100 

4.368 
(0.20) 

4.388 
(0.30) 

4.371 

3.445 

110 

10.422 
(0.19) 

10.464 

(0.38) 
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10.431 

7.521 

Table  3:  Call  on  the  maximum  of  10  assets 

Table  3  contains  estimates  of  the  price  of  an  American  call  option  on  the  maximum  of  10 
assets.  We  use  the  following  set  of  parameter  values:  r  =  0.05,  T  =  I,  k  —  100,  Sj,  =  0.1, 
ai  =  0.2  and  pij  =  0  for  i,  j  =  1, ...,  5.  All  stocks  are  assumed  to  have  the  same  initial  price  So- 
The  columns  "Lower  Bound"  and  "Upper  Bound"  contain  estimates  of  the  lower  and  upper 
bounds,  respectively.  The  standard  errors  of  these  estimates  are  given  in  brackets.  Results 
ai'e  displayed  for  problems  with  10,  25,  50  and  100  time  periods.  We  report  estimated  options 
prices  in  $'s  and  their  standard  errors  in  cents. 

5o      Luwer  Bound     Upper  Bound     European  Price 


10  exercise  periods 

90            15.082  15.092                   14.747 

(0.40)  (0.37) 

100          26.860  26.863                  26.403 

(0.46)  0.60) 

110           39.076  39.076                  38.522 

(0.51)  (0.48) 

25  exercise  periods 

90            15.158  15.172                   14.747 

(0.39)  (0.51) 

100           26.960  26.970                  26.403 

(0.45)  (0.74) 

110          39.182  39.193                  38.522 

(0.50)  (0.64) 

50  exercise  periods 

90            15.181  15.216                   14.747 

(0.38)  (0.79) 

100           26.978  27.092                   26.403 

(0.45)  (3.46) 

110           39.225  39.261                   38.522 

(0.49)  (0.96) 

100  exercise  periods 

90            15.178  15.283                   14.747 

(0.39)  (1.36) 

100           26.996  27.070                  26.403 

(0.45)  (1.17) 

110           39.223  39.306                  38.522 


15.178 

15.283 

(0.39) 

(1.36) 

26.996 

27.070 

(0.45) 

(1.17) 

39.223 

39.306 

(0.50) 

(2.01) 
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